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Abstract 

To enable the resource discovery of audiovisual documents over the WWW, it will be necessary to define 
content description standards or metadata standards for complex, multi-layered, time-dependent 
information-rich audiovisual data streams. In particular, this is the primary goal of the emerging MPEG-7 
standard, the "Multimedia Content Description Interface" [i], under development by the MPEG group. In 
the past, a lot of effort has gone into generating descriptors and description schemes for video indexing 
but comparatively little research has been done on schemas capable of defining the structure, content and 
semantics of video documents and enabling validation and higher levels of automated content checking. 
This paper compares the capabilities of the RDF Schema, Extensible Markup Language p(ML) Document 
Type Definitions (DTD's), Document Content Description (DCD) and Schema for Object-Oriented XML 
(SOX), for supporting and validating hierarchical video descriptions based on Dublin Core, MPEG-7 and 
a specific hierarchical structure. Finally this paper proposes a hybrid schema based on features from each 
of these schemas which will satisfy the MPEG-7 Description Definition Language (DDL) requirements. 
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L Introduction 

To enable the resource discovery of audiovisual documents over the WWW, it will be necessary to define 
content description standards or metadata standards for complex, multi-layered, time-dependent 
information-rich data streams. In particular, this is the primary goal of the developing MPEG-7 standard, 
the "Multimedia Content Description Interface" [i], under development by the MPEG group. 

A number of papers have considered the application of Dublin Core (DC) and the Resource Description 
Framework (RDF) to video indexing [2, 3, 4, 5]. An example of such an application is described briefly 
below. However, very little work has been done on defining schemas which are capable of actually 
validating and constraining video descriptions and their associated data models. Such schemas will be 
necessary for the development of cost-eflficient, user-fi-iendly, semi-automatic metadata generation and 
editing tools for video. Such a schema would also provide a solution for the Description Definition 
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Language (DDL) component of the MPEG7 requirements. 

This paper first briefly presents a video description scheme based on Dublin Core and MPEG-7. From this 
description format, a list of schema requirements are generated. It then compares the ability of a number of 
existing schemas and schema proposals, including the RDF Schema, XML DTDs, DCD and SOX, to 
satisfy descriptions of hierarchical video structures. Examples of schema definitions are given to illustrate 
their capabilities. 

Finally this paper proposes a hybrid schema based on specific features firom each of these schemas and 
schema proposals which would satisfy the MPEG-7 Description Definition Language (DDL) 
requirements. 



2. Proposed Video Description Scheme 

Dublin Core was designed specifically for generating metadata to facilitate the resource discovery of 
textual documents. Although a number of workshops have been held to discuss the applicability of Dublin 
Core to non-textual documents such as images, sound and moving images, they have primarily focused on 
extensions to the 1 5 core elements through the use of subelements and schemes specific to audiovisual 
data, to describe bibliographic-type information rather than the actual content. 

It has been shown [2] that it is possible to describe both the structure and fine-grained details of video 
content by using the fifteen Dublin Core elements plus qualifiers and encoding this within RDF . This 
"pure Dublin Core" approach provides multiple levels of descriptive information. At the top level the 1 5 
basic Dublin Core elements can be used to describe the bibliographic type information about the complete 
document (e.g. Title, Author, Contributor, Date etc.). This enables non-specialist inter-disciplinary 
searching, independent of media type. Extensions or qualifiers to specific DC elements (Type, 
Description, Relation, Coverage) can be applied at the lower levels (scenes, shots, frames) to provide 
fine-grained, discipline- and media-specific searching (e.g. Description. Camera. Angle). The disadvantage 
of this approach is that the semantic refinement of Dublin Core through the use of qualifiers eventually 
leads to a loss of semantic interoperability. 

The alternative is a "hybrid" approach in which RDF (or some other framework) is used to combine both 
simple unqualified Dublin Core and MPEG-7 descriptors within a single description container. Dublin 
Core can be used for generic media-independent search and retrieval while MPEG-7 can be used for 
object-specific fine-grained queries. Our future research will compare and evaluate these two approaches 
for multimedia resource discovery and determine the best balance between semantic interoperability, 
extensibility and modularity. At this stage, we don't know the specific attributes of each level, we can only 
assume that each structural component will possess both a set of Dublin Core attributes plus a set of 
MPEG-7 attributes, as illustrated in Figure 1 below. 

For example, if DC.Type = "Image.Moving.TV.News.Scene" then valid descriptors will include both the 
DC simple elements plus MPEG-7 descriptors such as script, transcript, editlist, keyframe etc. If DC.Type 
= "Image.Moving.TV.News.Scene.Shot" then valid descriptors will include both the DC elements plus 
keyframe, camera_distance, camera_angle, camera_motion, opening_transition, closing_transition. If 
DC.Type = "Image.Moving.TV.News. Scene. ShotFrame" then a valid descriptors will be the DC elements 
plus colour_histogram. 

Figure 1 shows the logical structure, the structural components and their associated Dublin Core attributes 
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and some assumed MPEG-7 attributes for the proposed video description scheme. 
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Figure 1: Multilayered Hierarchical Structure and Attributes of Video 



3. Video Metadata Schema Requirements 

In order to represent the video structure and Dublin Core descriptors outlined in Figure 1 , a suitable 
schema must be able to support the following: 

• Hierarchical structure definitions. The schema must be able to constrain the structure to a precise 
hierarchy in which complete video documents sit at the top level. These in turn contain sequences, 
which contain scenes, which contain shots, which contain frames, which contain objects or actors. 
Figure 1 illustrates this hierarchy. 

• Each level (or class) within the hierarchy must be constrained to possess only specific attributes. In 
our description scheme, we assume that each layer possesses the 1 5 simple, optional and extensible 
DC elements plus a set of class-specific attributes unique to that layer. These represent the set of 
MPEG-7 descriptors for that class when they become available. 

• Element and attribute inheritance. It should be possible to specify sub-classing with inheritance of 
attributes and elements from the upper to lower classes. In addition, sub-classes should be able to 
have their own additional attributes and elements. This allows efficient reuse and customization of 
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document schemas. 

• Data Typing. It must be possible to constrain the values of attributes to certain data types. Data types 
supported should include primitive data types as well as Schemes (e.g. SMPTE), enumerated data 
types, controlled vocabularies, file types (images), URIs and complex data types (e.g. colour 
histograms, 3D vectors, graphs, RGB values etc.). It should also be possible to specify multiple 
alternative schemes or data types for a particular attribute. 

• Cardinality within attributes should be representable. It must be possible to specify that an attribute 
can have zero, one or multiple values. Ideally the minimum and maximum number of attributes 
should also be specifiable e.g. a scene must contain between 2 and 5 shots. 

• Spatio-temporal specifications. The Schema must be able to support the specification of temporal 
characteristics e.g. begin and end time of segments and their duration. Similarly, it should be able to 
support spatial representation e.g. regions within an image or motion along a line. 

• Spatial, temporal and conceptual relations. Spatial relations such as neighbouring objects and 
temporal relations such as sequential or parallel segments should be supported. Griven such a 
relationship between two classes, it should also be possible to constrain specific attribute values of 
these classes. For example, the start and end times of scenes contained within a sequence, must lie 
within the start and end time of that sequence. 

• Human-readability. It is desirable rather than mandatory that both the schema and the description 
output from the schema should be human-readable. 

• Availability of supporting technologies such as parsers (capable of validating input descriptions), 
databases and query languages. 

These requirements are similar and compatible with the DDL requirements listed in section 4.1.1 of the 
MPEG-7 Requirements Document [7]. 



4. Resource Description Framework (RDF) Schema 

The Resource Description Framework (RDF) enables interoperability between applications which 
exchange machine-understandable information on the Web. A model for representing metadata as well as 
a syntax for encoding RDF, based on XML has been defined in the RDF Model and Syntax Specification 
document [8]. 

RDF is based on a resource and property data model system. A collection of classes (typically authored for 
a specific purpose or domain) and the definition of their properties (attributes) and corresponding 
semantics represent an RDF schema. A schema defines not only the properties of the resource or class 
(Title, Author, Subject, Size, Color etc.) but also may define the kinds of resources being described 
(books, webpages, people, companies, etc.). The details of RDF schemas have been defined in the RDF 
Schema Specification document [9]. 

Classes are organized in a hierarchy, and offer extensibility through subclass refinement. This way, in 
order to create a schema slightly different from an existing one, one can just provide incremental 
modifications to the base schema. Through the sharability of schemas RDF will support the reusability of 
metadata definitions. Due to RDF's incremental extensibility, agents processing metadata will be able to 
trace the origins of schemes they are unfamiliar with back to known schemes, and perform meaningfiil 
actions on metadata they weren't originally designed to process. The sharability and extensibility of RDF 
also allows metadata authors to use multiple inheritance to "mix" definitions, to provide multiple views to 
their data, taking advantage of work done by others. The XML namespace mechanism serves to identify 
different RDF Schemas. 
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RDF schemas can be compared to XML Document Type Descriptions (DTDs). Unlike an XML DTD, 
which gives specific constraints on the syntactical structure of a document, an RDF schema provides 
semantical information about the interpretation of the statements given in an RDF data model. Given its 
goals, RDF appears to be the ideal approach for supporting descriptors from multiple description schemes 
simultaneously, as required by the MPEG-7 DDL. 



4.1 Example of a Suitable RDF Schema 

This section describes an RDF schema definition that attempts to map to the diagram in Figure 1 and 
support the requirements listed above. 

Since we want the DC simple attributes to be applicable to every component or layer, videos, sequences, 
scenes, shots, frames and objects are all sub-classes of a top level document class which possesses the DC 
attributes. In addition each sub-class has its own additional descriptive properties or attributes which will 
correspond to MPEG-7 descriptors when they become available. 
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One of the problems with RDF is to create a generic property such as contains by which the hierarchical 
structure can be defined i.e. videos contain sequences which contain shots which contain frames which 
contain objects and actors. If you create a property contains for #video then how do you also apply it to 
#sequence, #scene and #shot? Since each property requires a single range, then generic relationships such 
as contains cannot be used. Instead, a separate property must be defined for each domain-range pair. This 
is tedious and repetitive. The lack of class-specific constraints on domain and range of properties is a 
major limitation of RDF, particularly when applied to complex multilayered documents in which you want 
to specify constraints on structural, spatial, temporal and conceptual relationships between components. 
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Another problem is the limited data typing within RDF. There are three ways of specifying data types 
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within RDF: 



• Use the primitive Literal data type available within the RDF schema definition. This is any quoted 
string. 

• Implement a kind of enumerated data type by defining the range to be a class with a number of 
predefined instance values. This is used in the example below to define the possible values for shot 
transitions. 

• Point to a separate namespace in which the data types have been defined. In the example below we 
refer to "http://www.w3.org/TR/datatypes" for any data types other than literal. This namespace 
doesn't currently exist but it is intended to define this within the W3C XML Schema Working Group 
[ 10 ] which has recently been set up. 

Below is an example of the RDF Schema code defining some of the scene, shot , frame and object 
properties. It illustrates the three data typing methods available. 
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4.2 Advantages of the RDF Schema for Video Metadata 



RDF Schemas, within the context of this application, have the following advantages: 

• RDF Schema is able to provide meanings to elements or semantic structure not possible using purely 
syntactic schemas such as XML DTDs.. However the sorts of machine-understandable meanings 
provided in the current version of RDF Schema is very limited - so the advantage of "semantic 
validation" is virtually negligible. 

• The other schemas really only provide implicit child or contains relationships between elements. 
With RDF you can specify any relationship types explicitly through properties but this is limited by 
the need to specify a single range. It isn't possible to constrain a particular relationship to multiple 
range/domain pairs e.g. sequences can only contains scenes which can only contain shots etc. 

• Multiple namespaces. This enables the same feature to have different descriptors which correspond 
to different domains or description schemes. The ability to mix classification vocabularies within one 
XML-based encoding allows video authors or others to deliver richer domain-specific content 
descriptions thus increasing the re-usability of the video on the Web. This is a key requirement of the 
MPEG-7 DDL. 

• Inheritance is supported through sub-classes and sub-properties. This provides easy extensibility and 
reuse of code. 

• A simple RDF parser (SiRPAC [11 ] ) exists but it has limited validation capabilities, checking only 
that the domain and range constraints are satisfied. 

• It is human-readable, simple to understand and thus simple to extend or customize. 



4.3 Limitations of the RDF Schema for Video Metadata 

RDF Schema has the following problems or limitations: 

• Unstable. The RDF Schema specifications are still under development and change fi-equently. 

• Limited or no data typing. Almost all data typing will need to be provided by external namespaces, 
which don't yet exist. 

• No cardinality. It isn't possible to specify optional, zero or multiple values for an attribute. 

• Range constraints such as minimum and maximum values are not supported. 

• Class-specific range constraints are not possible. Only one range is possible for a given property. The 
only way to do provide multiple ranges is to create multiple properties e.g. secs_start_time, 
fi*ame_start_time, SMPTE_start_time. 

• RDF Schema can't describe multilayered structures using a single generic "contains" property. This 
requires multiple specific "contains" properties i.e. "contains_sequences", "contains_scenes", 
"contains_shots", "contains_fi'ames". The alternative is to implement code outside of the schema 
which understands DCRelation.HasParts semantics and can perform the validation. 

• Property-centricity makes readability difficult. The link between properties and classes is defined 
within the property definitions not the class definitions. 

• No query language exists for RDF. Given a video structure, to find videos with similar structures, you 
need to be able to store RDF structures in a directed graph with associated attribute values in a 
database. 

• The simplest way to specify spatial and temporal relationships is via the Collection elements: Seq, 
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Bag and Alt, but these provide limited semantics. Since no <Par> element exists within RDF, the 
<Bag> element must be used to specify parallelism. For spatial relationships such as neighbours, if 
the list of neighbours is in a collection, can we assume that the first one is the nearest neighbour? 

• Cannot map relationship-type properties between classes to constraints on the attribute values of the 
classes involved. For example, if two scenes abutt then their respective end and start firame numbers 
must be consecutive. If a sequence "contains" a scene, then the start and end times of the scene, must 
lie within the start and end times of the sequence. This is not supported by RDF Schema. 

• RDF Schema is an incomplete mapping of the RDF Syntax and Data model. There are very useful 
features available within the RDF Syntax and Data Model Spec, which aren't supported in the RDF 
Schema. 



5. XML DTDs 

Extensible Markup Language pQVEL) Document Type Definitions (DTDs) provide a subset of SGML for 
describing documents. XML was developed by the XML Working Group under the World Wide Web 
Consortium (W3C) in 1996. The complete XML spec, is available fi-om the W3C.[i2]. 

Each XML document has both a logical and a physical structure. Physically, the document is composed of 
units called entities. An entity may refer to other entities to cause their inclusion in the document A 
document begins in a "root" or document entity. Logically, the document is composed of declarations, 
elements, comments, character references, and processing instructions, all of which are indicated in the 
document by explicit markup. The logical and physical structures must nest properly. 

The function of the markup in an XML document is to describe its storage and logical structure and to 
associate attribute-value pairs with its logical structures. XML provides the document type declaration, to 
define constraints on the logical structure and to support the use of predefined storage units. An XML 
document is valid if it has an associated document type declaration and if the document complies with the 
constraints expressed in it. Document type declarations are made in a Document Type Definition (DTD) 
file. The DTD file then contains a formal definition of a particular type of document outlining the element 
names and the structure of the document. 



5.1 An Example of an XML DTD for Video Documents 

The structure is defined in the element definitions at the top of the DTD. Each element has a set of 
associated attributes. All elements have an ID attribute plus the DC attributes. In addition, sequences, 
scenes and shots also have a set of time attributes (begin, end, duration). Each element also has its own set 
of level-specific attributes (which will correspond to the MPEG-7 descriptors when they become 
available). 
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5.2 Advantages of XML DTDs for Video Metadata 

• Work is progressing on a query languages for XML e.g. XML-QL [13 ]. 

• XML parsers exist. 

• Simplicity associated with a single namespace. Users only have to understand one namespace. 

• XML is simpler than SGML, HyTime etc. 

• XML DTDs are easy to read and understand. Short and sweet without all that data typing. 

• Hierarchical structures are supported but only on a syntactical basis. 



5.3 Disadvantages of XML DTDs for Video Metadata 

♦ No name spaces. Since name spaces are not supported, definitions such as Dublin Core attributes will 
need to be redefined unless external entities are used. External entities provide a similar capability to 
namespaces. An external entity can be retrieved firom an external DTD via a URI to this DTD and the 
entity's ID. 
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• Cardinality of attributes is zero or one in XML DTDs. This creates problems with DC attributes 
which are optional and repeatable. They may need to be declared as elements. 

• There is very limited support for data typing. Only three kinds of attribute types are supported: a 
string type, a set of tokenized types and enumerated types. However Bray [14] has shown that it is 
possible to attach strong type declarations to XML elements using reserved attributes. 

• It is a purely 'syntactic* machine-understandable schema which can't provide any of the semantics 
associated with complex structured multimedia data or support object-oriented data modelling 
concepts. 

• There is no inheritance. 

• There are no relationships possible other than the implicit contains. 



6. Document Content Description (DCD) for XML 

The Document Content Description (DCD) [ 15 ] facility for XML is an RDF vocabulary designed for 
describing constraints to be applied to the structure and content of XML documents. It consists of a set of 
properties used to constrain the types of elements and names of attributes that may appear in an XML 
document, the contents of the elements and the values of the attributes. It was designed to provide 
semantics over and above the purely syntactical XML DTDs. It was also designed to be conformant with 
the RDF Model and Syntax Specification (with some simplifications). DCD also incorporates a subset of 
an earlier submission to W3C, the XML-Data Submission [16]. 

The introduction to the XML-Data Submission says that it "describes an XML vocabulary for schemas, 
that is, for defining and documenting object classes. It can be used for classes which are strictly syntactic 
(for example, XML) or those which indicate concepts and relations among concepts (as used in relational 
databases, KR graphs and RDF). The former are called 'syntactic schemas;' the latter 'conceptual 
schemas'." Thus, XML-Data and DCD add object-oriented and data modelling concepts such as class 
inheritance to purely syntactic schemas such as XML DTDs. 

DCD Schemas are based on elements and attributes. Elements correspond to RDF property types. DCD 
declarations constrain the content and attributes of elements in document instances, by assigning 
properties to objects of type ElementDef and AttributeDef. 



6.1 Example of a DCD Schema 

The DCD Schema below is based on the following assumptions: 

• The Dublin Core elements are all described in a separate name space. 

• The root element video doc contains video ^sequences which contains videoscenes etc. 

• The Dublin Core elements apply to every level. 

• In addition the sequence, scene and shot elements possess start time, end time and duration 
elements. 

• In addition, each level has its own unique elements/attributes corresponding to MPEG-7 descriptors. 
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6.2 Advantages of DCD for Video Metadata 



• Human-readable and simple. 

• Provides better data typing than RDF Schemas and XML DTDs (but still only basic). Also provides 
upper and lower bound constraints on attribute values. 

• Provides cardinality. 

• Supports multiple namespaces. 

• As an RDF vocabulary, it inherits the advantages of the data modelling concepts in RDF, plus 
constructs such as RDF:Seq and RDF:Alt. 



6.3 Disadvantages of DCD for Video Metadata 

• Currently no subclassing or inheritance but this is planned for the future. The proposal is to create 
subclasses from existing elements through an extends property. 

• Only basic data typing is supported , not complex data types. There is no support for multiple 
alternate data types, except if you create alternate elements with different data types e.g. start_time 
value can be SMPTE, sees, frames(int). Also there is no support for constraining the values of certain 
attributes of related elements. 

• Doesn't support data types such as points, lines, polygons, colour histograms etc. These would all 
have to be described in a separate namespace e.g. "http://www.w3.org/TR/complex_datatypes". It is 
not possible to specify that just the element's datatype is to be a value from another namespace. You 
need to specify that the element itself is totally described in another namespace. 

• Only Seq and or Alt Groups are available. Bag is not a legal value for the RDF:Order property. Seq is 
fine for specifying sequential components but for multimedia, there is also a need to support groups 
of elements which run in parallel. The RDF Bag element is the most suitable for specifying this, (in 
the absence of any Par value), but it isn't supported in DCD. 



7. Schema for Object-Oriented XML (SOX) 

Schema for Object-Oriented XML (SOX) [ 17 ] provides a facility for defining the structure, content and 
semantics of XML documents to enable XML validation and automated content checking. 

SOX provides an alternative to XML DTDs for modeling markup relationships. The introduction to the 
SOX specification says that it provides the following advantages over XML DTDs: 

• More efficient software development processes for distributed applications; 

• Basic intrinsic datatypes; 

• An extensible datatyping mechanism; 

• Content model and attribute interface inheritance; 
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• A powerful namespace mechanism; 

• Embedded documentation. 

SOX supports three varieties of datatypes: scalar datatypes^ enumerated datatypes and format datatypes. 
Scalar datatypes are derived from the basic number datatype, and support specification of the number of 
digits and decimal places, minimum and maximum value range, and a mask. An enumerated datatype may 
be derived from any of the intrinsic datatypes, and may specify an enumeration of valid values. A format 
datatype may be derived from any of the intrinsic datatypes, and must specify a mask. 

In SOX, element types may inherit their content models and attribute definitions directly from another 
named element type. An element type may also inherit and extend an attribute list Specialization of 
attribute definitions allows refinement and restriction of attribute datatype, enumeration list and default 
value. Additionally, an attribute value may be defined to be inherited from the identically named attribute 
on a parent or older ancestor element. Thus, for example, namespaces can be inherited from superordinate 
elements. 

The SOX namespace facility enables Objects from any identifiable namespace to be used in building a 
SOX document. That is, any element, attribute, datatype, enumeration, entity, interface, notation, 
parameter, or processing instruction may be imported from any namespace. 

A SOX document is a valid XML document, according to the SOX DTD. The schema designer is free to 
employ the same XML tools used for traditional XML documents. This means that a SOX document can 
processed by a validating XML parser, formatted according to an XSL stylesheet, and managed by any 
DOM-compliant or SAX-compliant application. 



7.1 SOX Example 

In this example, the structural elements, video_doc, video_sequence, video_scene, video_shot, 
video_frame and video_object are declared first. They each possess the DC attributes, plus their own 
specific elements and attributes. 
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The next step is to break down the elements to sub-elements and eventually data types. SOX supports both 
intrinsic basic datatypes as well as user-defined scalar, enumeration and formatted datatypes, derived fi-om 
the intrinsic datatypes. The code below illustrates some of the capabilities of SOX data typing for video 
description. 
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7.2 Advantages of SOX for Video Metadata 

• XML query languages, when available, will work on SOX documents. 

• XML parsers will work on SOX documents. In order to perform complete validation of SOX-specific 
constraints, extra parsing code will be required. 

• Provides much better data typing capabilities than the other schemas - scalar, enumerated and 
formatted data types. 

• SOX provides the best cardinality with "occurs n,m". 

• Inheritance provides the possibility for reuse of code, elements and datatype definitions. Elements 
can inherit their definitions fi'om existing elements (using instanceoj) and also extend with new 
attributes (using extends). 



7.3 Disadvantages of SOX for Video Metadata 
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• SOX was designed for validating business documents in e-commerce applications. Consequently it is 
more suitable for validating static forms than complex multimedia structures. It is element-focussed 
rather than entity-focussed. 

• It only provides the implicit "contains" relationship. Structural constraints would require multiple 
"contains" elements. 

• Inheritance is possible but users can only extend elements with new attributes. Ideally one should be 
able to extend elements with both new attributes and new elements, 

• Not an RDF vocabulary. 



8. Conclusions: the Ultimate Schema 

None of the above schemas is ideal for describing complex multimedia documents. They all satisfy some 
of the requirements but fall down in other areas. None of them is designed for describing complex 
hierachical structures in which there are spatial, temporal, structural and conceptual relationships between 
the components and where these relationships map to constraints on the relative attribute values of the 
related components. For example, spatial relationships such as neighbours, in-front-of, behind, 
overlapping and surrounds correspond to certain constraints on the values of the shape, location or 
volume attributes of the related objects. Similarly temporal relationships such as contains, sequential, 
parallel and overlapping should be mapped to constraints on the start time, end time and duration of the 
components. None of the schemas support these capabilities. 

RDF Schema claims to differ from the other schemas, in that it is not a "syntactic" schema but a "semantic 
schema". However the sorts of machine-understandable meanings provided in the current version of RDF 
Schema is very limited. So the advantage of "semantic validation" is negligible. RDF Schema is good at 
containing and combining descriptors from different name spaces/communities but it has virtually no data 
typing. Data types must be defined in a separate name space. This has yet to be done but the work is 
intended to be done within the W3C XML Schema Working Group [10 ] which has recently been set up. 
RDF Schemas also don't easily support multi-layered hierarchical structures because of the inability to 
specify generic relationship types using properties and to apply these across multiple domain/range pairs. 
So although RDF is better than the other schemas because of its ability to specify relationships other than 
the implicit child or contains relationship (which is the only one that the other schemas offer), this facility 
is limited to a specific range and domain due to the lack of "class-specific constraints". 

XML DTDs offer simplicity and fast, cost-effective development due to the ready availability of parsers, 
tools and applications. However, as a data modelling language, they have limited semantics, which 
XML-Data, DCD and SOX schemas try to expand by adding things such as strong data typing and lexical 
constraints. 

DCD is an improvement on XML DTDs because it provides better data typing and also provides 
additional semantics via its RDF conformity. However it doesn't currently support inheritance - although 
this is a future goal. 

SOX has the best data typing. It is also XML compliant so that XML parsers and XML-QL (when it 
becomes available) will work on it. It supports inheritance but with attribute extension only, not element 
extension. It is not RDF-conformant. SOX also provides the best cardinality - enabling the minimum 
(other than 0 or 1) and maximum number of children of an element to be specified e.g. maximum of 10 
shots per scene. DCD and XML DTDs can only specify zero or more or one or more children. 
RDF-Schema doesn't support cardinality. 
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An additional desirable schema feature would be the ability to define equivalence relationships between 
attributes and define constraints based on these relationships. For example, suppose there are two 
attributes, ColourText and ColourHistogram. Then in an ideal schema, users would be able to define an 
enumerated data type for ColourText (red, yellow, green, blue etc) and for each of these possible values, a 
correponding permissable range of ColourHistograms would be defined. An even more complex example 
is the mapping of a textual description attribute to some combination of shape, colour and texture attribute 
value ranges. Such a schema could then be used to both validate the integrity of the input data but also 
automatically generate metadata where it is not provided. This ability to map from high-level features 
(such as text) to low level features (colour, shape, texture) is one of the requirements of the MPEG-7 
DDL. It would also greatly improve the searchability of complex multimedia archives. None of the 
schemas examined provide such sophisticated capabilities. 

Other relevant schemas not covered in this paper include: the XSchema specification [18] and XML-Data 
[16 ]. XSchema is very similar to SOX. Formerly known as XSD, XSchema began as a proposal for the 
representation of XML DTDs as XML documents. The advantages of using XML document syntax to 
describe XML document structures include the ability to browse and edit XSchemas using XML-aware 
tools. This can*t be done on DTDs which are not pure XML documents. Although XML-Data has some 
very useful features, it appears to have been superseded by DCD. 



8.1 The Ideal Schema for Multimedia 

Based on the above analysis and comparisons, the best solution for video metadata representation is one 
which provides the object-oriented, semantical concepts of RDF but expresses them within an 
easily-understood, human-readable XML schema. We have proposed such a XML schema for the 
MPEG-7 DDL [19] which provides the following features: 

• The semantics and object-oriented concepts of inheritance provided by RDF through classes, 
sub-classes^ properties and sub-properties, 

• The extensible data typing capabilities of SOX; 

• The addition of a relation entity which allows spatial, temporal, structural and conceptual relations to 
be defined between classes and constraints on the domain, range and property values to be specified; 

• The XML namespace facility; 

• Cardinality; 

• Temporal and spatial controls and specifications; 

• Linking mechanisms which enable links between descriptions and between content and descriptions; 

The two major problems associated with this proposal are that it constitutes yet another schema or 
language and there are likely to be quite complex extensions necessary to the basic XML parser in order to 
perform complete validation of all of the constraints. 

The W3C XML Schema Working Group [iO] is looking at a XML-based schema language which provides 
support for data typing and structural constraints, currently lacking in XML DTDs. Their charter includes 
delivering a recommendation on the best combination of DCD, XML-Data, SOX and RDF for validating 
document syntax. Based on the XML Schema Requirements document [20], there is a very real possibility 
that the schema which they develop will satisfy the majority of the MPEG-7 DDL requirements. 
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